Goto

Collaborating Authors

 factor vector


SOFARI-R: High-Dimensional Manifold-Based Inference for Latent Responses

Zheng, Zemin, Zhou, Xin, Lv, Jinchi

arXiv.org Machine Learning

Data reduction with uncertainty quantification plays a key role in various multi-task learning applications, where large numbers of responses and features are present. To this end, a general framework of high-dimensional manifold-based SOFAR inference (SOFARI) was introduced recently in Zheng, Zhou, Fan and Lv (2024) for interpretable multi-task learning inference focusing on the left factor vectors and singular values exploiting the latent singular value decomposition (SVD) structure. Yet, designing a valid inference procedure on the latent right factor vectors is not straightforward from that of the left ones and can be even more challenging due to asymmetry of left and right singular vectors in the response matrix. To tackle these issues, in this paper we suggest a new method of high-dimensional manifold-based SOFAR inference for latent responses (SOFARI-R), where two variants of SOFARI-R are introduced. The first variant deals with strongly orthogonal factors by coupling left singular vectors with the design matrix and then appropriately rescaling them to generate new Stiefel manifolds. The second variant handles the more general weakly orthogonal factors by employing the hard-thresholded SOFARI estimates and delicately incorporating approximation errors into the distribution. Both variants produce bias-corrected estimators for the latent right factor vectors that enjoy asymptotically normal distributions with justified asymptotic variance estimates. We demonstrate the effectiveness of the newly suggested method using extensive simulation studies and an economic application.


Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Andriy Mnih, Yee W. Teh

Neural Information Processing Systems

User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user's item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data.


Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Neural Information Processing Systems

User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user's item selection process. In the interests of scalability, we restrict our attention to treestructured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data.


Predicting Sparse Clients' Actions with CPOPT-Net in the Banking Environment

Charlier, Jeremy, State, Radu, Hilger, Jean

arXiv.org Machine Learning

The digital revolution of the banking system with evolving European regulations have pushed the major banking actors to innovate by a newly use of their clients' digital information. Given highly sparse client activities, we propose CPOPT-Net, an algorithm that combines the CP canonical tensor decomposition, a multidimensional matrix decomposition that factorizes a tensor as the sum of rank-one tensors, and neural networks. CPOPT-Net removes efficiently sparse information with a gradient-based resolution while relying on neural networks for time series predictions. Our experiments show that CPOPT-Net is capable to perform accurate predictions of the clients' actions in the context of personalized recommendation. CPOPT-Net is the first algorithm to use non-linear conjugate gradient tensor resolution with neural networks to propose predictions of financial activities on a public data set.


Implementing libFM in Keras (IT Best Kept Secret Is Optimization)

#artificialintelligence

I just won a gold medal on Talking Data competition on Kaggle, finishing 6th. My approach and solution is described here. The part that triggered most interest from readers is where I used matrix factorization techniques to generate additional features. Before that, let me briefly explain what this competition was about. To support your modeling, they have provided a generous dataset covering approximately 200 million clicks over 4 days!


Hankel Matrix Nuclear Norm Regularized Tensor Completion for $N$-dimensional Exponential Signals

Ying, Jiaxi, Lu, Hengfa, Wei, Qingtao, Cai, Jian-Feng, Guo, Di, Wu, Jihui, Chen, Zhong, Qu, Xiaobo

arXiv.org Machine Learning

Signals are generally modeled as a superposition of exponential functions in spectroscopy of chemistry, biology and medical imaging. For fast data acquisition or other inevitable reasons, however, only a small amount of samples may be acquired and thus how to recover the full signal becomes an active research topic. But existing approaches can not efficiently recover $N$-dimensional exponential signals with $N\geq 3$. In this paper, we study the problem of recovering N-dimensional (particularly $N\geq 3$) exponential signals from partial observations, and formulate this problem as a low-rank tensor completion problem with exponential factor vectors. The full signal is reconstructed by simultaneously exploiting the CANDECOMP/PARAFAC structure and the exponential structure of the associated factor vectors. The latter is promoted by minimizing an objective function involving the nuclear norm of Hankel matrices. Experimental results on simulated and real magnetic resonance spectroscopy data show that the proposed approach can successfully recover full signals from very limited samples and is robust to the estimated tensor rank.


Recommending music on Spotify with deep learning

#artificialintelligence

In this post, I'll explain my approach and show some preliminary results. This is going to be a long post, so here's an overview of the different sections. If you want to skip ahead, just click the section title to go there. Traditionally, Spotify has relied mostly on collaborative filtering approaches to power their recommendations. The idea of collaborative filtering is to determine the users' preferences from historical usage data. For example, if two users listen to largely the same set of songs, their tastes are probably similar. Conversely, if two songs are listened to by the same group of users, they probably sound similar. This kind of information can be exploited to make recommendations.


Sparse Probabilistic Matrix Factorization by Laplace Distribution for Collaborative Filtering

Jing, Liping (Beijing Key Lab of Traffic Data Analysis and Mining and Beijing Jiaotong University) | Wang, Peng (Beijing Key Lab of Traffic Data Analysis and Mining and Beijing Jiaotong University) | Yang, Liu (Beijing Key Lab of Traffic Data Analysis and Mining and Beijing Jiaotong University)

AAAI Conferences

In recommendation systems, probabilistic matrix factorization (PMF) is a state-of-the-art collaborative filtering method by determining the latent features to represent users and items. However, two major issues limiting the usefulness of PMF are the sparsity problem and long-tail distribution. Sparsity refers to the situation that the observed rating data are sparse, which results in that only part of latent features are informative for describing each item/user. Long tail distribution implies that a large fraction of items have few ratings. In this work, we propose a sparse probabilistic matrix factorization method (SPMF) by utilizing a Laplacian distribution to model the item/user factor vector. Laplacian distribution has ability to generate sparse coding, which is beneficial for SPMF to distinguish the relevant and irrelevant latent features with respect to each item/user. Meanwhile, the tails in Laplacian distribution are comparatively heavy, which is rewarding for SPMF to recommend the tail items. Furthermore, a distributed Gibbs sampling algorithm is developed to efficiently train the proposed sparse probabilistic model. A series of experiments on Netfilix and Movielens datasets have been conducted to demonstrate that SPMF outperforms the existing PMF and its extended version Bayesian PMF (BPMF), especially for the recommendation of tail items.


Learning Label Trees for Probabilistic Modelling of Implicit Feedback

Mnih, Andriy, Teh, Yee W.

Neural Information Processing Systems

User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user's item selection process. In the interests of scalability, we restrict our attention to tree-structured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data.


Learning Item Trees for Probabilistic Modelling of Implicit Feedback

Mnih, Andriy, Teh, Yee Whye

arXiv.org Machine Learning

User preferences for items can be inferred from either explicit feedback, such as item ratings, or implicit feedback, such as rental histories. Research in collaborative filtering has concentrated on explicit feedback, resulting in the development of accurate and scalable models. However, since explicit feedback is often difficult to collect it is important to develop effective models that take advantage of the more widely available implicit feedback. We introduce a probabilistic approach to collaborative filtering with implicit feedback based on modelling the user's item selection process. In the interests of scalability, we restrict our attention to tree-structured distributions over items and develop a principled and efficient algorithm for learning item trees from data. We also identify a problem with a widely used protocol for evaluating implicit feedback models and propose a way of addressing it using a small quantity of explicit feedback data.